Skip to content

Propose R wordbuilder MVP#96

Closed
kiki830621 wants to merge 1 commit into
mainfrom
codex/r-word-builder-mvp-spec
Closed

Propose R wordbuilder MVP#96
kiki830621 wants to merge 1 commit into
mainfrom
codex/r-word-builder-mvp-spec

Conversation

@kiki830621
Copy link
Copy Markdown
Collaborator

Summary

Validation

  • spectra analyze r-word-builder-mvp --json
  • spectra validate r-word-builder-mvp
  • git diff --check -- openspec/changes/r-word-builder-mvp

Refs #88

@kiki830621
Copy link
Copy Markdown
Collaborator Author

Verify Report — PR #96

Engine

6-AI ensemble: 5 general-purpose Agents (Claude reviewers) + Codex (gpt-5.5 xhigh). All 5/5 reviewer findings present + non-empty on first attempt (no Recovery Protocol invoked this round). Codex completed.

Aggregate

NEEDS WORK — Spectra structure is valid (spectra validate + analyze --json pass per Codex), but multiple BLOCKERs converge across reviewers + Codex + Devil's Advocate. The most severe is a CRITICAL spec-level security gap: R→Swift code injection model is completely absent from the spec.

Scope coverage


#88 — R package that emits WordBuilderSwift scripts

Requirements coverage (Codex + Requirements + Logic + Regression consensus):

#88 Requirement Status
R package / external PsychQuant repo / naming FULLY
word_document / add_heading / add_paragraph / add_text_run / add_table / render FULLY
R state → runnable Swift importing WordBuilderSwift PARTIALLY
swift run shell-out + return .docx path PARTIALLY
Retain .swift artifact as committable FULLY (but undermined by absolute paths — see #4 below)
Persistent Swift package cache FULLY at proposal level (PARTIALLY at detail)
character / data.frame / list mapping PARTIALLY (list of character → mixed runs MISSING)
ggplot2 FULLY re-scoped (documented)
model summary NOT addressed / undocumented re-scope
Golden + integration test strategy PARTIALLY
Vignette NOT addressed
Security / injection model NOT addressed

Findings (merged + deduplicated across 6 sources)

# Severity Finding Source Action
1 CRITICAL R→Swift code injection model entirely absent from spec. R user-controlled strings (data.frame cells, column names, headings, paragraphs) flow straight into generated Swift source. No normative escaping requirement. Payloads like "; import Foundation; FileManager.default.removeItem(...) or string interpolation \(...) can escape the literal. tasks.md mentions "escaping tests" but that's a task line, not a Spec Requirement. Need: dedicated Safe-Swift-Source-Escaping Requirement + adversarial fixtures (quote, backslash, interpolation, multi-line, unicode, Swift keyword as column name). agents:security+codex+devils-advocate Spec must add normative Requirement before merge
2 CRITICAL Retained .swift script contains absolute paths (cache dir, image paths). Spec declares it as "committable / reviewable / diffable" — the PR's main product differentiator vs officer/rmarkdown. But absolute paths make commit-portability impossible. Two valid R users on different machines produce different .swift outputs for same R input. Either (a) spec lacks path-canonicalization Requirement, or (b) the commit-portability claim is over-stated. agents:devils-advocate Spec must reconcile: add canonicalization OR remove portability claim
3 HIGH list of character → mixed runs is in #88's Phase 1 scope (issue body lines 69-81) but PR has zero spec coverage and is not declared a Non-Goal. Silent re-scope. agents:requirements+codex Add Spec Requirement OR Non-Goal entry
4 HIGH model summary in #88's Phase 1 scope. Zero spec coverage. Undocumented re-scope. agents:codex+devils-advocate Add Spec Requirement OR Non-Goal entry
5 HIGH Spec.md normatively names 5 WordBuilderSwift types (Document/Section/Paragraph/TextRun/Table). Any future type rename in word-builder-swift breaks R spec. No abstraction layer. agents:regression Either pin to current type names with version OR abstract via R-side adapter
6 HIGH Phase 1 single-section limitation in word-builder-swift NOT propagated to R spec. R API allows multi-section semantics that silently get truncated by writer (emit only first section). Silent data loss. agents:regression Spec must propagate constraint as R-side normative
7 HIGH Cache concurrency / poisoning model undefined. No file lock for parallel render() calls. Cache key includes only word-builder-swift version, not generated swift hash → stale cached binary collisions. No GC mechanism. No detection of stale .build/release/Runner binary poisoning. agents:security+logic Spec Section "Cache lifecycle and concurrency"
8 HIGH SwiftPM lockfile missing from spec. "Pinned word-builder-swift version" ≠ deterministic resolution. Need normative requirement that Package.resolved ships with the R package, otherwise transitive deps re-resolve on every cache init (typosquat / hijack window). agents:security Spec Requirement: ship pinned Package.resolved
9 HIGH swift run trust model undefined. Spec says "write Swift into cache or copy into runner target" — the "or" is a red flag. No directory permissions specified, no hash verification of Package.swift. Cache poisoning trivial. agents:security Spec normative trust model
10 HIGH macOS-only constraint (because WordBuilderSwift is macOS Swift) excludes ~70% of academic R audience (Linux/Windows-heavy stats labs). PR proceeds as if this is acceptable but doesn't analyze the audience reach impact. agents:devils-advocate Document explicitly OR pivot to cross-platform Swift toolchain
11 HIGH "Byte-identical Swift output" is declared SHALL but not enforceable — R has hash-randomized list iteration, locale-dependent string ordering, data.frame row.names default behavior, float-to-string locale variance. Spec lists no canonicalization rules (no UTF-8 fixing, no newline normalization, no field ordering, NA/NaN/Inf formatting, factor/date/numeric formatting, locale/timezone freezing). agents:codex+devils-advocate+logic Spec canonicalization section
12 HIGH Distribution channel completely missing. remotes::install_github vs R-universe vs CRAN — spec doesn't pick. Affects audience reach, version pinning model, and update flow. agents:devils-advocate Pick + document
13 MEDIUM Toolchain preflight only covers "swift binary missing". Needs: wrong version, no internet for SPM, Xcode CLT missing, swift package resolve fails, build fails, executable exits non-zero, stderr/stdout passthrough, cache directory unwritable. agents:codex+logic Expand preflight Requirement
14 MEDIUM R env vars leak to swift run subprocess (OPENAI_API_KEY, AWS_SECRET_ACCESS_KEY etc.). Combined with code injection vector (#1) = exfiltration via ProcessInfo.environment. Default behavior should be scrubbed env. agents:security Spec normative env policy
15 MEDIUM "Committable .swift artifact" contains R data verbatim. PII / PHI leak vector. Spec actively encourages committing without sensitivity warning. No opt-out (retain = FALSE / retain = "redacted") defined. agents:security Spec must warn + provide opt-out
16 MEDIUM add_text_run() semantics — WordBuilderSwift TextRun cannot directly attach to Document root; R API doesn't specify how to locate target paragraph. agents:logic Spec must define attachment semantics
17 MEDIUM Section handling — WBS requires ≥ 1 Section. R API implies single-section default but spec doesn't lock. agents:logic Spec single-section default
18 MEDIUM add_plot() appears in #88 example but neither spec nor proposal mention it. Need explicit Phase 2 deferral or scope inclusion. agents:logic Spec Non-Goal entry
19 MEDIUM bookmarks: design.md non-goal list includes; proposal.md doesn't. Internal inconsistency. agents:regression Reconcile
20 MEDIUM CI test strategy unclear — spec says "locally or periodically". No macOS runner setup, Swift/R/officer install policy, skip conditions, cache isolation, integration test triggers. Risk: golden files alone cannot prove generated Swift compiles. agents:codex+devils-advocate Spec CI section
21 LOW wordbuilder package name availability not checked (CRAN / R-universe collision risk). agents:devils-advocate Pre-flight check
22 LOW Future repo PsychQuant/r-word-builder existence not verified. agents:devils-advocate Verify or document plan
23 LOW tasks.md has trailing EOF line (cosmetic noise). agents:codex Strip
24 LOW Render result struct field names / R class not defined. agents:requirements Spec data shape

Scope Check

Spec proposal is in-scope for #88 MVP framing, but silent re-scopes on list-of-character + model summary (HIGH #3, #4) plus omission of audience-reach analysis (HIGH #10) constitute Issue-intent drift not explicitly acknowledged.

Security

Per CRITICAL findings #1 + #2 + HIGH #7-9, #14, #15: this proposal specifies an arbitrary-code-execution surface (R-controlled string → Swift code → swift run subprocess) without a normative threat model in spec. Implementation must retrofit security boundaries that would otherwise be impossible to add without breaking R caller API.

Process Gaps

None — all 5/5 reviewer findings produced on first attempt.


Recommendation

Do NOT merge. Severity now elevates above PR #94/#95: 2 CRITICAL (code injection + portability collapse) + 10 HIGH. Path A only.

Required Spec revisions before merge:

  1. Add normative Safe Swift Source Escaping Requirement with adversarial-input Scenarios (quote / backslash / interpolation / multi-line / unicode / Swift keyword as identifier)
  2. Resolve .swift portability — either add path canonicalization Requirement OR drop "committable" claim and rename it "ephemeral debug artifact"
  3. List-of-character + model-summary: add to Spec Requirements OR explicit Non-Goal with rationale
  4. Pin WBS type names with version in spec OR abstract via R-side adapter
  5. Propagate Phase 1 single-section limit to R Spec normatively
  6. Cache lifecycle: lock semantics, GC, hash-based key (not just version), stale binary detection
  7. Ship pinned Package.resolved requirement; document SwiftPM transitive resolution model
  8. Document swift run trust model: cache permissions, Package.swift hash verification
  9. macOS-only audience-reach analysis + Linux/Windows future-path note
  10. Determinism canonicalization rules: UTF-8/newline/field ordering/NA/NaN/Inf/factor/date/numeric/locale/timezone
  11. Distribution channel decision (install_github / R-universe / CRAN)
  12. Expand toolchain preflight beyond PATH check
  13. Env var scrub policy for swift run subprocess
  14. Sensitivity warning + opt-out (retain = FALSE) for committable artifact
  15. Reconcile bookmarks/add_plot inconsistencies
  16. CI test plan (macOS runner, dep install, skip conditions, cache isolation)
  17. Strip tasks.md trailing EOF noise

This is a larger revision request than #94 or #95 because the proposal claims a security-sensitive boundary it doesn't define.

@kiki830621
Copy link
Copy Markdown
Collaborator Author

Blocked pending spec revision per verify report. 2 CRITICAL: (1) R→Swift code injection model entirely absent from spec — R user-controlled strings flow straight to Swift source without normative escaping requirement; (2) retained .swift contains absolute paths, breaking the 'committable artifact' product differentiator. Full revision list at #96 (comment) (17 items).

@kiki830621
Copy link
Copy Markdown
Collaborator Author

Closing — Superseded by ooxml-edit-isomorphism-foundation (#99 merged in PR #106)

Per #99 ADR-009, this proposal is reframed as a Layer 4 (caller) front-end of the architectural foundation. The current spec has known gaps (see verify report findings) that the foundation's locked contract addresses naturally.

The re-framing work is tracked at #103 — Codex (or other authors) can open a new PR citing ooxml-edit-algebra capability spec + foundation design.md ADRs as inputs. The Codex work in this PR remains accessible in repo history (git show codex/<branch>) for reference when authoring the re-frame.

Issue #88 (original umbrella) was closed as absorbed into #99 architecture; #103 is the operational follow-up.

Refs #99 #88 #103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant